EDLSI with PSVD Updating
Abstract
This paper reports results from combining two techniques that improve search and retrieval with Latent Semantic Indexing (LSI): Essential Dimensions of LSI (EDLSI) and partial singular value decomposition (PSVD) updating. EDLSI is an implementation of LSI that requires only a few dimensions of the LSI space. The PSVD updating and folding-up algorithms incrementally maintain the integrity of the LSI space as new content is added. We show that EDLSI works as expected with these updating techniques, maintaining retrieval performance while dramatically improving runtime performance. Folding-in is another technique for incorporating documents into an LSI space; however, it does not deliberately maintain the orthogonality of the document and term vectors, so retrieval performance usually suffers. Interestingly, we find that combining EDLSI with folding-in yields retrieval performance very similar to that of standard LSI, at a dramatically reduced cost.
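The abstract's remark that folding-in does not maintain orthogonality can be illustrated with a small NumPy sketch. This is not the paper's implementation: the random matrices, the dimension `k = 5`, and the helper names `truncated_svd`, `fold_in`, and `orthogonality_error` are all assumptions made for illustration. It uses the standard folding-in projection, in which each new document vector d is mapped to d^T U_k Σ_k^{-1} and appended to the document vectors without recomputing the decomposition.

```python
import numpy as np


def truncated_svd(A, k):
    """Return the rank-k factors (U_k, s_k, V_k) of a term-document matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(U_k, s_k, V_k, D):
    """Fold new document columns D (terms x p) into an existing LSI space.

    Each new document d is projected as d^T U_k diag(s_k)^{-1} and appended
    as a new row of V_k. U_k and s_k are left unchanged.
    """
    D_hat = (D.T @ U_k) / s_k  # shape (p, k); divides column j by s_k[j]
    return np.vstack([V_k, D_hat])


def orthogonality_error(V):
    """Frobenius norm of V^T V - I; near zero when the columns are orthonormal."""
    k = V.shape[1]
    return np.linalg.norm(V.T @ V - np.eye(k))


# Hypothetical data: 50 terms, 30 original documents, 10 new documents.
rng = np.random.default_rng(0)
A = rng.random((50, 30))
D = rng.random((50, 10))
k = 5

U_k, s_k, V_k = truncated_svd(A, k)
V_folded = fold_in(U_k, s_k, V_k, D)

err_before = orthogonality_error(V_k)
err_after = orthogonality_error(V_folded)
print(f"orthogonality error before folding-in: {err_before:.2e}")
print(f"orthogonality error after folding-in:  {err_after:.2e}")
```

In this sketch the error is at rounding level before folding-in and clearly nonzero afterwards, because the appended rows add a term to V^T V. PSVD updating and folding-up, as described in the abstract, are the approaches intended to restore the integrity of the space.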
Similar resources
A New Adaptive Folding-up Algorithm for Information Retrieval
Text collections can be represented mathematically as term-document matrices. A term-document matrix can in turn be represented using the matrix factorization method known as the partial (or truncated) singular value decomposition (PSVD). Recomputing the PSVD when changes are made to a text collection is very expensive. Folding-in is one method of approximating the PSVD when new documents are a...
Two Uses for Updating the Partial Singular Value Decomposition in Latent Semantic Indexing
Latent Semantic Indexing (LSI) is an information retrieval (IR) method that connects IR with numerical linear algebra by representing a dataset as a term-document matrix. Because of the tremendous size of modern databases, such matrices can be very large. The partial singular value decomposition (PSVD) is a matrix factorization that captures the salient features of a matrix, while using much le...
Updating the partial singular value decomposition in latent semantic indexing
Latent semantic indexing (LSI) is a method of information retrieval that relies heavily on the partial singular value decomposition (PSVD) of the term-document matrix representation of a dataset. Calculating the PSVD of large term-document matrices is computationally expensive; hence in the case where terms or documents are merely added to an existing dataset, it is extremely beneficial to upda...
Distributed EDLSI, BM25, and Power Norm at TREC 2008
This paper describes our participation in the TREC Legal competition in 2008. Our first set of experiments involved the use of Latent Semantic Indexing (LSI) with a small number of dimensions, a technique we refer to as Essential Dimensions of Latent Semantic Indexing (EDLSI). Because the experimental dataset is large, we designed a distributed version of EDLSI to use for our submitted runs. We...
Ventilatory decline after hypoxia and hypercapnia is not different between healthy young men and women.
The gradual decay in ventilation after removal of a respiratory stimulus has been proposed to protect against cyclic breathing disorders such as obstructive sleep apnea (OSA). The male predominance of OSA, and the increased incidence of OSA in women after menopause, indicates that the respiratory-stimulating effect of progesterone may provide protection against OSA by altering the rate of posts...